---
title: "Notes on A/B Testing (Udacity)"
author: Yuanzhe Li
date: 2020-02
output: pdf_document
linkcolor: blue
---

Notes on the A/B Testing (Udacity) course.

Lesson 1: Overview of A/B Testing

1.15 Calculating confidence interval (CTR example)
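A minimal sketch of the calculation this section covers: a confidence interval for a click-through probability using the normal approximation. The helper name and the counts (100 clicks out of 1000 pageviews) are illustrative, not from the course:

```python
import math

def ctr_confidence_interval(clicks, pageviews, z=1.96):
    """95% CI for a click-through probability (normal approximation)."""
    p_hat = clicks / pageviews
    se = math.sqrt(p_hat * (1 - p_hat) / pageviews)   # standard error of p_hat
    margin = z * se                                    # z = 1.96 for 95% confidence
    return p_hat - margin, p_hat + margin

# Hypothetical numbers: 100 clicks out of 1000 pageviews
lo, hi = ctr_confidence_interval(100, 1000)
# CI is roughly (0.081, 0.119) around p_hat = 0.1
```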

1.17 Null and Alternative Hypothesis, Two-tailed vs. One-tailed tests

The null hypothesis and alternative hypothesis proposed here correspond to a two-tailed test, which allows you to distinguish between three cases:

1. A statistically significant positive result
2. A statistically significant negative result
3. No statistically significant difference

Sometimes when people run A/B tests, they will use a one-tailed test, which only allows you to distinguish between two cases:

1. A statistically significant positive result
2. No statistically significant positive result

Which one you should use depends on what action you will take based on the results.

If you're going to launch the change only when there is a statistically significant positive result, and not otherwise, then you don't need to distinguish between a negative result and no result, so a one-tailed test is good enough. If you want to learn the direction of the difference, then a two-tailed test is necessary.
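The distinction can be made concrete with p-values. A sketch (the `p_values` helper and the z-statistic of 1.8 are hypothetical): a result with z = 1.8 is significant at $\alpha = 0.05$ under a one-tailed test but not under a two-tailed test.

```python
import math

def p_values(z):
    """One- and two-tailed p-values for a z-statistic under the null."""
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z): tests "positive change"
    two_tailed = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|): tests "any change"
    return one_tailed, two_tailed

# Hypothetical z-statistic from an experiment
one, two = p_values(1.8)
# one ~ 0.036 (significant at 0.05), two ~ 0.072 (not significant)
```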

1.19 Pooled Standard Error
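As a reminder, the pooled probability and pooled standard error in this section follow the standard formulas (where $X$ counts clicks and $N$ counts pageviews in the control and experiment groups):

$$\hat{p}_{pool} = \frac{X_{cont} + X_{exp}}{N_{cont} + N_{exp}}, \qquad SE_{pool} = \sqrt{\hat{p}_{pool}\left(1-\hat{p}_{pool}\right)\left(\frac{1}{N_{cont}} + \frac{1}{N_{exp}}\right)}$$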

1.21 - 24. Sample Size and Power
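The sizing calculation can be sketched with the standard two-proportion sample-size formula. The helper below is an assumption (the course points to an online calculator rather than this exact formula), and the baseline 10% CTR with a 2% minimum detectable effect are illustrative inputs:

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_base, d_min, alpha=0.05, beta=0.2):
    """Approximate sample size per group to detect an absolute difference
    d_min in a proportion, at significance alpha and power 1 - beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_beta = NormalDist().inv_cdf(1 - beta)         # power requirement
    p_alt = p_base + d_min
    var_null = 2 * p_base * (1 - p_base)                       # variance under H0
    var_alt = p_base * (1 - p_base) + p_alt * (1 - p_alt)      # variance under H1
    n = (z_alpha * math.sqrt(var_null) + z_beta * math.sqrt(var_alt)) ** 2 / d_min ** 2
    return math.ceil(n)

# Illustrative: baseline CTR 10%, minimum detectable effect 2% (absolute)
n = sample_size_per_group(0.1, 0.02)
```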

1.25 Pooled Example

A pooled example is shown below; notice how $d_{min}$ works (we need the lower bound of the $1-\alpha$ level CI to be $> d_{min} = 0.02$).

[Figure: pooled example]
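The launch check in the example can be sketched in code. The `pooled_diff_ci` helper is a hypothetical name, and the click/pageview counts below are illustrative stand-ins for the example's numbers:

```python
import math

def pooled_diff_ci(x_cont, n_cont, x_exp, n_exp, z=1.96):
    """CI for the difference in click-through probabilities using the pooled SE."""
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    d_hat = x_exp / n_exp - x_cont / n_cont        # observed difference
    return d_hat - z * se_pool, d_hat + z * se_pool

# Illustrative counts: control 974/10072, experiment 1242/9886, d_min = 0.02
lo, hi = pooled_diff_ci(974, 10072, 1242, 9886)
launch = lo > 0.02   # lower bound of the CI must exceed d_min to recommend launch
```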

1.26 Confidence Interval Case Breakdown

Shown below is how we should make the decision under varying CI and $d_{min}$ cases.

[Figure: CI breakdown]

Lesson 2: Policy and Ethics for Experiments

2.1 - 2.7. Four Principles

The IRB's four main principles to consider when conducting experiments are:

1. Risk: does the study expose participants to more than minimal risk?
2. Benefits: what benefits might come out of the study?
3. Alternatives: what other choices do participants have?
4. Data sensitivity: what data is collected, and what are the privacy expectations?

2.8 Assessing Data Sensitivity

An example of a data sensitivity assessment is shown below.

[Figure: assessing data sensitivity]

2.10 Summary of Principles

Lesson 3: Choosing and Characterizing Metrics

3.2 - 3.3 Metric Definition Overview

3.5 Refining the Customer Funnel

An example of defining metrics for Udacity.

3.6 - 3.7 Quizzes on Choosing Metrics

3.8 Other techniques for defining metrics

3.10 - 11 Techniques to Gather Additional Data and Examples

[Figure: 3.10 techniques for getting additional data]

[Figure: 3.11 gathering data at Udacity]

3.12 When There Is No Data

3.13 Metric Definition: Click Through Example

3.16 - 3.17 Summary Metrics

3.18 - 3.19 Sensitivity and Robustness

3.20 Absolute Versus Relative Differences

3.21 - 3.22 Variability

3.24-25 Empirical Variability

Lesson 4: Designing an Experiment

[Figure: outline of Lesson 4]

4.2 - 4.3 Unit of Diversion Overview

4.4 - 4.5 Consistency of Diversion

4.6 - 4.7 Ethical Considerations

4.8 - 4.9 Unit of Analysis vs. Diversion

4.10 Inter- vs. Intra-User Experiments

In an interleaved ranking experiment, suppose you have two ranking algorithms, $X$ and $Y$. Algorithm $X$ would show results $X_1, X_2, \ldots, X_N$ in that order, and algorithm $Y$ would show $Y_1, Y_2, \ldots, Y_N$. An interleaved experiment would show some interleaving of those results, for example, $X_1, Y_1, X_2, Y_2, \ldots$ with duplicate results removed. One way to measure this would be by comparing the click-through rate or probability of the results from the two algorithms. For more detail, see *Large-Scale Validation and Analysis of Interleaved Search Evaluation*.
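The interleaving described above can be sketched as follows. This is a simplified alternating interleave assuming equal-length result lists; production systems typically use more careful schemes such as team-draft interleaving to control position bias:

```python
def interleave(x_results, y_results):
    """Alternate results from two ranked lists, dropping duplicates.
    Assumes the two lists have equal length (zip truncates otherwise)."""
    seen, merged = set(), []
    for xi, yi in zip(x_results, y_results):
        for r in (xi, yi):          # take X's result, then Y's, at each rank
            if r not in seen:
                seen.add(r)
                merged.append(r)
    return merged

interleave(["a", "b", "c"], ["b", "d", "a"])
# → ['a', 'b', 'd', 'c']
```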

4.11 - 4.13 Target Population, Cohort

4.16 - 4.18 Sizing Examples

4.20 - 22. Duration vs. Exposure

4.23 Learning Effects

Lesson 5: Analyzing Results

Lesson 6: Final Project

Reference

Chapelle, O., Joachims, T., Radlinski, F., & Yue, Y. (2012). Large-Scale Validation and Analysis of Interleaved Search Evaluation. ACM Transactions on Information Systems.